
Add sublayer compute function and example project for dense #62

Merged: 7 commits merged into master from jmgd/sublayer on Jul 10, 2018

Conversation

@jmduarte (Member) commented May 25, 2018

This is a PR to fix the memory problem (issue #59) when unrolling large loops.

The idea is to break up the loop by partitioning the output array for each layer call.

This PR only addresses the fully connected layer.
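
For illustration, here is a minimal sketch of the partitioning idea (the function, pragma placement, and template names are hypothetical, not the actual hls4ml code): each sublayer call produces one contiguous slice of the layer's output array, so the compiler only has to unroll the loops for that slice.

// Hypothetical sketch: compute only a slice of a dense layer's output.
// CONFIG_T::n_part is the slice size; offset says where the slice starts.
template<class data_T, class res_T, typename CONFIG_T>
void compute_sublayer(
    data_T  data[CONFIG_T::n_in],
    res_T   res_part[CONFIG_T::n_part],
    typename CONFIG_T::weight_t weights[CONFIG_T::n_in * CONFIG_T::n_out],
    typename CONFIG_T::bias_t   biases[CONFIG_T::n_out],
    int offset)
{
    for (int jj = 0; jj < CONFIG_T::n_part; jj++) {
        #pragma HLS UNROLL
        typename CONFIG_T::accum_t acc = biases[offset + jj];
        for (int ii = 0; ii < CONFIG_T::n_in; ii++) {
            #pragma HLS UNROLL
            acc += data[ii] * weights[ii * CONFIG_T::n_out + offset + jj];
        }
        res_part[jj] = (res_T) acc;
    }
}

The full output is then assembled by calling this once per slice and concatenating the partial results.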

@nhanvtran (Contributor)

This looks like a great start. I did a quick check and the results are similar between the sublayer and full-layer computations -- but not exactly the same. I guess you have to store the intermediate values and waste FFs.

Two next thoughts:

compute_layer() {

    // allocate multiplier resources
    #pragma HLS ALLOCATION instances=mul limit=multiplier_limit operation

    for (int i = 0; i < n_sublayers; i++) {
        compute_sublayer(i);
    }
    merge_sublayers();
}

@benjaminkreis (Member)

Regarding the point on pruning, that is worth a try. We could also switch to calculating the number of nonzero multiplications on the fly, like we do for the convolutional layer: https://github.com/hls-fpga-machine-learning/hls4ml/blob/master/nnet_utils/nnet_conv.h#L108-L109
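
For reference, inside the dense layer's compute function that accounting would look roughly like the fragment below, mirroring the nnet_conv.h lines linked above; a CONFIG_T::n_zeros field would have to be added to the dense config, so treat the names as assumptions rather than existing code.

// Budget multipliers from the nonzero-weight count, as nnet_conv.h does:
// pruned (zero) weights never need a multiplier, so they are subtracted
// from the limit handed to the ALLOCATION pragma. Requires <cmath>.
const int multiplier_limit =
    ceil(float(CONFIG_T::n_in * CONFIG_T::n_out) / float(CONFIG_T::reuse_factor))
    - floor(float(CONFIG_T::n_zeros) / float(CONFIG_T::reuse_factor));
#pragma HLS ALLOCATION instances=mul limit=multiplier_limit operation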

@nhanvtran (Contributor)

^^^ this

Maybe it's good to develop consistent machinery between conv and mlp?

@jmduarte (Member, Author)

@nhanvtran we never replied to your idea about doing loops within loops.

My feeling is that by doing the separate sublayer calls within a loop, you'll end up with the same problem (i.e. it's going to try to unroll everything).

This is why I imagined having the hls_writer just write the sublayer calls sequentially. I almost have the update to hls_writer done, so I'll update the PR and you can see how it will work for more generic dense networks.
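
Roughly, the generated myproject.cpp would then contain a run of calls like the following for each large dense layer (the variable names and the merge helper here are illustrative, not the exact generated code):

// Illustrative excerpt of sequentially written sublayer calls for one
// dense layer split into two pieces; a merge step reassembles the output.
layer2_t layer2_out_0[N_LAYER_2 / 2];
layer2_t layer2_out_1[N_LAYER_2 / 2];
layer2_t layer2_out[N_LAYER_2];

nnet::compute_sublayer<layer1_t, layer2_t, config2>(layer1_out, layer2_out_0, w2, b2, 0);
nnet::compute_sublayer<layer1_t, layer2_t, config2>(layer1_out, layer2_out_1, w2, b2, N_LAYER_2 / 2);
nnet::merge<layer2_t, N_LAYER_2 / 2, N_LAYER_2 / 2>(layer2_out_0, layer2_out_1, layer2_out);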

@jmduarte (Member, Author) commented Jun 11, 2018

You can test this hls_writer support with the following config (for example):

KerasJson: example-keras-model-files/KERAS_dense_big.json
KerasH5:   example-keras-model-files/KERAS_dense_big_weights.h5
OutputDir: my-hls-test-sublayer
ProjectName: myproject
XilinxPart:  xcku115-flvf1924-2-i
ClockPeriod: 5

IOType: io_parallel # options: io_serial/io_parallel
ReuseFactor: 50
DefaultPrecision: ap_fixed<16,6> 

The model is a big dense model: https://github.com/hls-fpga-machine-learning/keras-training/blob/master/models/models.py#L7

@jmduarte changed the title from "[WIP] Add sublayer compute function and example project" to "Add sublayer compute function and example project for dense" on Jun 12, 2018
@nhanvtran (Contributor)

@jmduarte writing the sublayers sequentially also works

@nhanvtran (Contributor)

So it looks like it's working well, but I'm a little concerned about how this looks to the user. Is there a way to "wrap" all the sublayer calls so they're not in the main function of the HLS project? Similarly (and probably a little more importantly), there are so many sublayer configurations that fine-tuning beyond the yaml configuration looks intractable. What do you think?

@jmduarte (Member, Author) commented Jul 5, 2018

@nhanvtran, the latest commit addresses your comment about the aesthetics.

I think the code looks more straightforward with the many sublayer calls factorized into their own functions at the bottom of the myproject.cpp top file.
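
Schematically, and again with made-up names rather than the exact generated code, the top function now contains one call per layer while the wrapper holding the sublayer calls sits at the bottom of the file:

// In the top function body the big dense layer is now a single call
// (a prototype above myproject() makes the wrapper visible here):
compute_layer2(layer1_out, layer2_out, w2, b2);

// ...and the wrapper at the bottom of myproject.cpp holds the sublayer
// calls and the merge, following the same pattern as the earlier sketch:
void compute_layer2(layer1_t layer1_out[N_LAYER_1],
                    layer2_t layer2_out[N_LAYER_2],
                    weight2_t w2[N_LAYER_1 * N_LAYER_2],
                    bias2_t  b2[N_LAYER_2]) {
    layer2_t layer2_out_0[N_LAYER_2 / 2];
    layer2_t layer2_out_1[N_LAYER_2 / 2];
    nnet::compute_sublayer<layer1_t, layer2_t, config2>(layer1_out, layer2_out_0, w2, b2, 0);
    nnet::compute_sublayer<layer1_t, layer2_t, config2>(layer1_out, layer2_out_1, w2, b2, N_LAYER_2 / 2);
    nnet::merge<layer2_t, N_LAYER_2 / 2, N_LAYER_2 / 2>(layer2_out_0, layer2_out_1, layer2_out);
}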

Take a look (you can run the config referenced above) and let me know if it's ok.

Thanks,
Javier

@nhanvtran (Contributor)

tested and will merge so that we can proceed to conv sublayers

@nhanvtran merged commit c3da0e7 into master on Jul 10, 2018
@violatingcp pushed a commit that referenced this pull request on Feb 10, 2019: "Add sublayer compute function and example project for dense"
@jmduarte deleted the jmgd/sublayer branch on August 4, 2021
@ddddavid-he commented Jan 9, 2023

ERROR: [XFORM 203-504] Stop unrolling loop 'Product1' (firmware/nnet_utils/nnet_dense_latency.h:85) in function 'nnet::dense_latency<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, config11>' because it may cause large runtime and excessive memory usage due to increase in code size. Please avoid unrolling the loop or form sub-functions for code in the loop body.

It seems that the problem still exists with the new writer.

I am converting a fairly large CNN model, like this:

# imports added for completeness (tf.keras assumed)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPool1D, Flatten, Dense

model = Sequential([
    Conv1D(filters=5, kernel_size=5, strides=2, activation='relu'),
    MaxPool1D(pool_size=5, strides=3),
    Conv1D(filters=10, kernel_size=5, strides=2, activation='relu'),
    MaxPool1D(pool_size=5, strides=3),
    Conv1D(filters=20, kernel_size=5, strides=2, activation='relu'),
    Flatten(),
    Dense(120, input_shape=(20*15,), activation='relu'),
    Dense(64, input_shape=(120,), activation='relu'),
    Dense(2, input_shape=(64,), activation=None)
])

And the problem seems to occur in the dense layer. Is there any solution?

@calad0i pushed a commit to calad0i/hls4ml that referenced this pull request on Jul 1, 2023: "…ing/jmgd/sublayer" (Add sublayer compute function and example project for dense)
@marina-neseem

I am facing the same issue. I am converting a basic CNN.

# imports added for completeness (tf.keras assumed)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, kernel_size=3, activation='relu'))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

I get the same error

ERROR: [XFORM 203-504] Stop unrolling loop 'Product1' (firmware/nnet_utils/nnet_dense_latency.h:37) in function 'nnet::conv_2d_cl<nnet::array<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, 64u>, nnet::array<ap_fixed<16, 6, (ap_q_mode)5, (ap_o_mode)3, 0>, 32u>, config4>' because it may cause large runtime and excessive memory usage due to increase in code size. Please avoid unrolling the loop or form sub-functions for code in the loop body.
ERROR: [HLS 200-70] Pre-synthesis failed.
command 'ap_source' returned error code
    while executing
"source build_prj.tcl"
    ("uplevel" body line 1)
    invoked from within
"uplevel \#0 [list source $arg] "

Is there any solution for it?

@jmduarte (Member, Author) commented Aug 1, 2023

Hi @marina-neseem, this just means you're trying to fully parallelize/unroll the CNN operations (e.g. by using io_parallel), and you're hitting a limitation built into the HLS compiler.

There is another dataflow scheme in hls4ml called io_stream that we typically recommend for CNNs.

See https://fastmachinelearning.org/hls4ml/details.html#i-o-types
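
For reference, in a yaml config like the one shown earlier in this thread, that corresponds to changing the IOType line:

IOType: io_stream # instead of io_parallel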

@layson-inventor

I tried setting io_type='io_stream', but I get the same error.
